🛢️ PGK 👶 🐍 workshop

Introduction to python data analysis with an example analysing whether there is any correlation between seismicity and production/injection in an oil field.

To run this script, you need a Python 3.x environment with numpy, pandas and plotly.

There are multiple ways to install Python on your system, but for this workshop the steps below are the easiest.

  1. Install the Anaconda python environment from the Anaconda website. This includes python, Jupyter notebooks, and a set of pre-installed packages for data analysis and machine learning.
  2. The plotly package does not come pre-installed, but can be added by:
    1. Starting Anaconda Prompt (search for anaconda prompt in your windows program folder)
    2. A window opens showing something like (base) C:\>.
    3. Type conda install plotly and press Enter; the plotly library and other dependencies will be installed automatically (if this does not work, try pip install plotly instead).
    4. Close the window once done.

🖥️ Running python code

In this workshop we use jupyter notebooks to easily run code and show results. You can use any text editor to write python scripts, but the interactive environment of these notebooks is a good place to start when you're new to Python. Note that Anaconda comes with both jupyter notebooks and a pre-installed python code editor called Spyder; both can be found under the Anaconda program folder. To run jupyter:

  1. Look for/search for jupyter in your programs folder (it's installed under Anaconda).
  2. A jupyter command window will open, and shortly after a browser window will be opened that brings you to the jupyter home page.
  3. From this page, you can start a new notebook via the New button (top right), or you can upload and open an existing notebook file using Upload.
  4. You can also use Upload to upload data files that you are using in your scripts.

For more info see https://jupyter-notebook.readthedocs.io/en/stable/

🥅 of this workshop

Using spreadsheet data as a basis, we will do some basic data processing and analysis to look into a microseismic dataset and a set of wells to see if there is any pattern between production/injection and seismicity. For data analysis we make use of pandas, a popular library for processing structured data (i.e. spreadsheets) and for plotting we make use of plotly, which is optimized for plotting pandas objects.

The aim of this workshop is to demonstrate some of the data processing and visualization powers of python and to show specifically some of the advantages of python over Excel.


1. Getting started: Python module imports

By default a python session only contains the very basic functionality. You can import the specific functionality you need by importing the relevant packages. This ensures that your python session is fit-for-purpose without using too many system resources.

In this case, we will be importing pandas for data handling and part of the plotly library for plotting. Note that we can rename the packages we import for easier use.

Once we've imported a package, we can use its functions by calling the package followed by a dot and the function. For example, to use the csv file importer in pandas, type pd.read_csv('filename.csv').

In [1]:
import numpy as np  # array operations
import pandas as pd # dataframe operations
import plotly.express as px # plotting dataframes

2. Analyzing spreadsheet data: Microseismic data

Using pandas, spreadsheet data can be quickly imported into the python environment. Both Excel files and csv files are supported, but we will be using only csv files.
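As a minimal sketch of the import pattern (using an inline string in place of a file on disk so it runs standalone; the Excel filename is purely hypothetical):

```python
import io

import pandas as pd

# A tiny inline CSV standing in for a file on disk; read_csv accepts
# file paths as well as file-like objects
csv_text = "Date,Moment Magnitude\n5-Apr-11,1.72\n6-Apr-11,1.80\n"
events = pd.read_csv(io.StringIO(csv_text))
print(events.shape)  # (2, 2): two rows, two columns

# Excel files follow the same pattern (an engine such as openpyxl is needed):
# events = pd.read_excel("catalogue.xlsx", sheet_name="events")
```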

Using pandas, we import a catalogue of microseismic events (date, time, x, y, z location coordinates, and event magnitude). We store this information in the variable microseismic. In a jupyter notebook, you can easily print the data by simply writing the variable name. In 'real' python files this will not work (use print(microseismic) instead).

In [2]:
microseismic = pd.read_csv("microseismic.csv")
microseismic
Out[2]:
Date Time Northing [m] Easting [m] Depth_SS [m] Moment Magnitude
0 5-Apr-11 20:10:26 -624.25 -242.03 -2228.14 1.720
1 5-Apr-11 20:10:38 -509.00 -223.47 -2326.84 1.680
2 6-Apr-11 10:18:40 -299.75 -205.41 -2253.77 1.800
3 6-Apr-11 10:18:44 -308.50 -218.72 -2239.68 1.730
4 6-Apr-11 10:18:47 -309.00 -185.37 -2244.70 1.810
... ... ... ... ... ... ...
12953 10-Jan-19 16:26:16 -21.00 49.00 -2241.00 1.004
12954 10-Jan-19 16:26:16 -17.00 44.00 -2276.00 1.106
12955 10-Jan-19 16:45:50 12.00 119.00 -2325.00 1.403
12956 10-Jan-19 17:55:45 194.00 236.00 -2295.00 1.096
12957 10-Jan-19 20:14:07 162.00 6.00 -2372.00 1.555

12958 rows × 6 columns

Each event is stored in a row (12958 events in total). In this case the csv file has the column titles in the first row, and these are automatically assigned to column titles in the dataframe. You can select a column using these names, e.g. microseismic['Moment Magnitude'] will return the column with moment magnitudes.
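Column selection can be sketched on a small made-up dataframe (illustrative values, not the actual catalogue):

```python
import pandas as pd

# A made-up mini-catalogue with the same column names as microseismic
df = pd.DataFrame({
    'Moment Magnitude': [1.72, 1.68, 1.80, 2.27],
    'Depth_SS [m]': [-2228.14, -2326.84, -2253.77, -2228.17],
})

mags = df['Moment Magnitude']  # selecting one column returns a pandas Series
print(mags.max())              # 2.27, the largest magnitude
print(mags.mean())             # 1.8675, the average magnitude
```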

There is a lot more that can be customized when importing csv files, including using multi-level headers, automatic formatting and data conversions, filtering invalid data etc etc. To learn more there are many useful resources online, for example https://www.datacamp.com/community/tutorials/pandas-read-csv.
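A small sketch of a few of these options, on a hypothetical inline CSV where -999 marks invalid values (the column names and the sentinel value are assumptions for illustration):

```python
import io

import pandas as pd

raw = (
    "Date,Magnitude,Depth\n"
    "5-Apr-11,1.72,-2228.14\n"
    "6-Apr-11,-999,-2253.77\n"
)
df = pd.read_csv(
    io.StringIO(raw),
    parse_dates=['Date'],           # convert the Date column to datetimes on import
    na_values=['-999'],             # treat -999 as missing/invalid data
    usecols=['Date', 'Magnitude'],  # import only a subset of the columns
)
print(df.dtypes)  # Date is datetime64, Magnitude is float with one NaN
```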

There is also a column with date information. This is simply text data, but it can be used to turn the table into a timeseries object, which gives a lot of extra flexibility for filtering and editing the data. To do so, convert the date column with pd.to_datetime() and assign the result as the dataframe index.

In [3]:
microseismic.index = pd.to_datetime(microseismic['Date'])
microseismic
Out[3]:
Date Time Northing [m] Easting [m] Depth_SS [m] Moment Magnitude
Date
2011-04-05 5-Apr-11 20:10:26 -624.25 -242.03 -2228.14 1.720
2011-04-05 5-Apr-11 20:10:38 -509.00 -223.47 -2326.84 1.680
2011-04-06 6-Apr-11 10:18:40 -299.75 -205.41 -2253.77 1.800
2011-04-06 6-Apr-11 10:18:44 -308.50 -218.72 -2239.68 1.730
2011-04-06 6-Apr-11 10:18:47 -309.00 -185.37 -2244.70 1.810
... ... ... ... ... ... ...
2019-01-10 10-Jan-19 16:26:16 -21.00 49.00 -2241.00 1.004
2019-01-10 10-Jan-19 16:26:16 -17.00 44.00 -2276.00 1.106
2019-01-10 10-Jan-19 16:45:50 12.00 119.00 -2325.00 1.403
2019-01-10 10-Jan-19 17:55:45 194.00 236.00 -2295.00 1.096
2019-01-10 10-Jan-19 20:14:07 162.00 6.00 -2372.00 1.555

12958 rows × 6 columns

Still looks pretty much the same, but now we can quickly answer questions like: How many events did we have in the year 2013?

In [4]:
microseismic['2013-01-01':'2013-12-31']
Out[4]:
Date Time Northing [m] Easting [m] Depth_SS [m] Moment Magnitude
Date
2013-01-01 1-Jan-13 12:37:13 155.75 260.56 -2255.68 1.63
2013-01-01 1-Jan-13 16:53:38 268.00 294.16 -2273.05 1.83
2013-01-01 1-Jan-13 20:11:53 -549.75 -453.09 -2228.17 2.27
2013-01-03 3-Jan-13 19:54:41 161.50 254.59 -2288.99 1.65
2013-01-03 3-Jan-13 19:55:08 165.75 270.34 -2300.80 1.52
... ... ... ... ... ... ...
2013-12-30 30-Dec-13 21:58:09 87.00 -1039.94 -2312.31 2.42
2013-12-30 30-Dec-13 22:35:43 -611.00 -252.31 -2199.94 1.54
2013-12-31 31-Dec-13 6:13:31 367.50 291.75 -2314.08 2.27
2013-12-31 31-Dec-13 17:03:14 175.50 228.47 -2301.97 1.46
2013-12-31 31-Dec-13 20:15:05 -530.00 -348.06 -2243.93 2.09

1802 rows × 6 columns

Using the column names, we can also easily filter the data based on other properties, for example listing only events above a certain moment magnitude.

In [5]:
microseismic[microseismic['Moment Magnitude']>3]
Out[5]:
Date Time Northing [m] Easting [m] Depth_SS [m] Moment Magnitude
Date
2013-02-12 12-Feb-13 7:18:21 237.00 154.84 -2314.05 3.050
2014-01-15 15-Jan-14 14:56:46 377.25 178.44 -2313.50 3.020
2014-01-15 15-Jan-14 14:56:48 299.75 45.28 -2313.85 3.310
2015-09-28 28-Sep-15 12:03:19 -352.25 -1499.75 -2238.18 3.150
2017-07-23 23-Jul-17 17:05:17 652.25 -214.91 -2412.52 3.030
2017-09-23 23-Sep-17 7:32:18 -139.75 -273.78 -2314.81 3.010
2018-07-09 9-Jul-18 17:28:00 -196.00 -250.00 -2328.00 3.041

3. Plotting dataframes - timeseries

Python has many plotting libraries, the most oldskool one being matplotlib, whose interface closely mimics the plotting functionality of Matlab. However, python has much more powerful and easier-to-use plotting libraries, and plotly express in particular is extremely easy to use in combination with pandas.

Using plotly express, we can plot the timeseries and use the available columns in the dataframe to define plot properties.

In [6]:
# For reference, an illustration of how boring matplotlib is
import matplotlib.pyplot as plt
plt.scatter(microseismic['Easting [m]'],microseismic['Northing [m]'], label='Microseismic events')
Out[6]:
<matplotlib.collections.PathCollection at 0x2b0ef5f5b88>
In [7]:
px.scatter(microseismic, x='Date', y='Moment Magnitude',title='Microseismic')
In [8]:
# Pretty boring.. Excel can do that too. Now let's color the points by depth
px.scatter(microseismic, x='Date', y='Moment Magnitude',color='Depth_SS [m]',opacity=.5, title='Microseismic')

Looks nice, but this does not yet provide much information beyond the fact that there are many events. However, since this is a timeseries, we can easily resample the data to look at event frequency, for example per month.
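Resampling to per-month event counts can be sketched on a small made-up catalogue (four hypothetical event dates):

```python
import pandas as pd

# Made-up event dates, one row per event, with a datetime index like microseismic
dates = pd.to_datetime(['2013-01-05', '2013-01-20', '2013-02-11', '2013-04-02'])
events = pd.DataFrame({'Moment Magnitude': [1.5, 2.1, 1.8, 1.6]}, index=dates)

# 'M' resamples to month-end bins ('D' for days, 'W' for weeks);
# .size() counts the rows per bin, including empty months
monthly = events.resample('M').size()
print(monthly)  # Jan: 2, Feb: 1, Mar: 0, Apr: 1
```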

In [9]:
px.histogram(microseismic, x='Date', title='Events/day')

We can also make the figure a bit more interactive to easily look at specific date intervals. For this we customize the figure object returned by plotly express (here we also import the 'parent' library of plotly express, graph_objects, which we will need later).

In [10]:
import plotly.graph_objects as go
fig = px.histogram(microseismic, x='Date',title='Events/day')

fig.update_xaxes(rangeslider_visible=True)
fig.show()

4. Map view, 3-D and 4-D

We can also inspect the spatial distribution of seismicity by plotting events in a map view, still using px.scatter but now with the x and y coordinates as x and y axes. In addition, we can specify point size and color as a function of microseismic attributes, e.g.:

  • marker color indicating event depth
  • marker size shows event magnitude
In [11]:
px.scatter(microseismic, x='Easting [m]', y='Northing [m]', color='Depth_SS [m]', size='Moment Magnitude', size_max=10, opacity=.5, title='Map view seismicity')

Lots of overlapping points.. what does this look like in 3D?

3-D scatter plots can be made using px.scatter_3d instead of px.scatter, with the extra parameter z='Depth_SS [m]'.

In [12]:
px.scatter_3d(
    microseismic, 
    x='Easting [m]', 
    y='Northing [m]', 
    z='Depth_SS [m]', 
    color='Depth_SS [m]', 
    size='Moment Magnitude', 
    size_max=10, 
    opacity=.5, 
    title='3D view seismicity')

This shows some degree of event clustering, but plotting 8 years of events in 1 map does not help in showing where the real clusters are.

We can also show the distribution of seismicity per year or per month. This requires a new column that gives for each event the month and year in which it occurred. Since the microseismic data is a time series object, we can easily group data using .to_period('M').

For the grouping to work, the time format needs to be changed to a text label, which is done with the .index.strftime command.

To have a '4-D' plot, we use the same px.scatter function as before, but now with two additions:

  • animation_frame='monthyear' to define which column to use for grouping in time
  • range_x and range_y to fix the spatial extent of the plot
In [13]:
microseismic['monthyear'] = microseismic.to_period('M').index.strftime("%Y-%m-%d")
px.scatter(microseismic, 
           x='Easting [m]', 
           y='Northing [m]', 
           color='Depth_SS [m]', 
           size='Moment Magnitude', 
           size_max=10, 
           opacity=.5, 
           animation_frame='monthyear',
           range_x=[min(microseismic['Easting [m]']),max(microseismic['Easting [m]'])],
           range_y=[min(microseismic['Northing [m]']),max(microseismic['Northing [m]'])],
          title='Seismicity timelapse')

The same can be done in 3-D using again px.scatter_3d and adding a z= parameter.

In [14]:
px.scatter_3d(microseismic, 
           x='Easting [m]', 
           y='Northing [m]', 
           z='Depth_SS [m]',
           color='Depth_SS [m]', 
           size='Moment Magnitude', 
           size_max=10, 
           opacity=.5, 
           animation_frame='monthyear',
           range_x=[min(microseismic['Easting [m]']),max(microseismic['Easting [m]'])],
           range_y=[min(microseismic['Northing [m]']),max(microseismic['Northing [m]'])],
           range_z=[min(microseismic['Depth_SS [m]']),max(microseismic['Depth_SS [m]'])],
             title='3-D seismicity timelapse')

To keep better track of the 'center of gravity' of microseismic events, add animation_group, which treats rows with the same value as the same object across animation frames.

In [15]:
px.scatter(microseismic, 
           x='Easting [m]', 
           y='Northing [m]', 
           color='Depth_SS [m]', 
           size='Moment Magnitude', 
           size_max=30, 
           opacity=.5, 
           animation_frame='monthyear',
           animation_group='monthyear',
           range_x=[min(microseismic['Easting [m]']),max(microseismic['Easting [m]'])],
           range_y=[min(microseismic['Northing [m]']),max(microseismic['Northing [m]'])],
          title='Seismicity focal point')

Another way of looking at spatial clustering is using px.density_contour or px.density_heatmap.

In [16]:
px.density_contour(microseismic, 
           x='Easting [m]', 
           y='Northing [m]', 
           animation_frame='monthyear',
           animation_group='monthyear',
           range_x=[min(microseismic['Easting [m]']),max(microseismic['Easting [m]'])],
           range_y=[min(microseismic['Northing [m]']),max(microseismic['Northing [m]'])],
           title='Seismicity contour map')

5. Well location data

Now that we have a reasonable understanding of what the microseismic data looks like, we can bring in the well data, which is stored in two files:

  • well_locations.csv contains the well names, types and (TD) coordinates.
  • well_volumes.csv contains the injection/production volumes for each well and each time step.
In [17]:
well_locations = pd.read_csv("well_locations.csv")
well_locations
Out[17]:
Type Name x y z
0 Steam injector PGKYP117H1 146.3084 -202.991 -2321.827151
1 Steam injector PGKYP118H1 -227.6311 61.847 -2300.730740
2 Steam injector PGKYP119H2 290.7779 -0.810 -2314.625180
3 Steam injector PGKYP120H1 396.5311 519.811 -2312.246780
4 Steam injector PGKYP123H1 -689.5582 -15.152 -2306.458000
... ... ... ... ... ...
65 Producer PGKYP99H4 -632.3642 -1446.487 -2393.273108
66 Water injector PGKYP121H1 -632.8220 -95.544 -2098.473485
67 Water injector PGKYP124H2 -1301.7707 257.947 -2693.706605
68 Water injector PGKYP40H2 409.4545 -1031.604 -2590.693974
69 Water injector PGKYP42H2 -1750.9297 -732.331 -2562.187867

70 rows × 5 columns

There are 3 types of wells. Using the column Type with the keyword facet_col, the locations of these 3 well types can be plotted separately.

In [18]:
px.scatter(well_locations, x='x', y='y', color='z', hover_name='Name', facet_col='Type',title='Well locations')

The well locations can be shown in 3-D and compared with the distribution of seismic events.

This requires combining 2 separate dataframes into 1 figure, for which we need a Plotly Figure object. The details of the code below are outside the scope of this short session.

In [19]:
fig=go.Figure()
fig.add_trace(px.scatter_3d(microseismic,x='Easting [m]', y='Northing [m]', z='Depth_SS [m]', color='Depth_SS [m]', size='Moment Magnitude', size_max=10).data[0])
fig.add_trace(px.scatter_3d(well_locations, x='x', y='y', z='z', color='z', hover_name='Name').data[0])
fig.show()

6. Well volumes

Now we can see which wells are close to clusters of seismicity, but to study the relation between seismicity and production/injection, we need to know the injected/produced volumes per well. This information is stored in a separate spreadsheet, which contains monthly produced/injected volumes for oil, water and steam.

Each row lists these volumes for a given month and a given well.

In [20]:
well_volumes = pd.read_csv("well_volumes.csv")
well_volumes
Out[20]:
HOLE_NAME START_DATE OIL WATER WATER_INJECTION STEAM_INJECTION
0 PGKYPWDW-10H1 11/1/2012 0:00 0.000000 0.000000 0.000000 0.0
1 PGKYPWDW-10H1 8/1/2013 0:00 0.000000 0.000000 74.414478 0.0
2 PGKYPWDW-10H1 9/1/2013 0:00 0.000000 0.000000 72.343845 0.0
3 PGKYPWDW-10H1 10/1/2013 0:00 0.000000 0.000000 69.903877 0.0
4 PGKYPWDW-10H1 11/1/2013 0:00 0.000000 0.000000 55.692608 0.0
... ... ... ... ... ... ...
5593 PGKYP-99H4 2/1/2018 0:00 234.977304 35.371540 0.000000 0.0
5594 PGKYP-99H4 3/1/2018 0:00 223.623477 19.146902 0.000000 0.0
5595 PGKYP-99H4 4/1/2018 0:00 183.160144 57.907862 0.000000 0.0
5596 PGKYP-99H4 5/1/2018 0:00 149.141098 66.104463 0.000000 0.0
5597 PGKYP-99H4 6/1/2018 0:00 216.081794 56.676412 0.000000 0.0

5598 rows × 6 columns

To get an idea of what kind of data is in here, it's good practice to start by plotting it. In Excel this would be cumbersome, since the data for all wells is stored in a single table. However, with pandas and plotly we can group the data per well while plotting.

We now use px.line to plot rates as lines, which works similarly to px.scatter.

As an example, we plot oil production per well. To show each well separately, we use the color keyword. When continuous data is assigned to this keyword, the data is coloured by value but not grouped, as we saw when plotting seismic event depth. When categorical data is assigned, however, the data is automatically grouped.

As we are plotting oil production, we want to plot only the producer wells, which can be done by filtering the dataframe based on the oil volume produced: well_volumes[well_volumes['OIL']>0]

In [21]:
well_volumes['START_DATE'] = pd.to_datetime(well_volumes['START_DATE'])
px.line(well_volumes[well_volumes['OIL']>0],x='START_DATE',y='OIL',color='HOLE_NAME',title='Oil production per well')

7. Field-wide comparison of seismicity and production/injection

Using the well_volumes data, we can investigate whether there is a relation between seismicity and injection/production using field-wide averages.

The well volume data lists produced oil and water volumes and injected steam and water volumes. For comparing volumes to seismicity, it is helpful to also have the total injected/produced volumes. We add a few columns, which can be done in different ways:

  • simply add values of 1 column to values of another column using +
  • select a subset of columns and sum those using well_volumes[['col1','col2']].sum(axis=1) (note the double [[ ]]!).

We also make a column of the total, where injected volumes are negative, and produced volumes positive.

In [22]:
well_volumes['INJECTED'] = well_volumes['STEAM_INJECTION'] + well_volumes['WATER_INJECTION']
well_volumes['PRODUCED'] = well_volumes[['OIL','WATER']].sum(axis=1)
well_volumes['TOTAL'] = -well_volumes['INJECTED'] + well_volumes['PRODUCED']
well_volumes
Out[22]:
HOLE_NAME START_DATE OIL WATER WATER_INJECTION STEAM_INJECTION INJECTED PRODUCED TOTAL
0 PGKYPWDW-10H1 2012-11-01 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000
1 PGKYPWDW-10H1 2013-08-01 0.000000 0.000000 74.414478 0.0 74.414478 0.000000 -74.414478
2 PGKYPWDW-10H1 2013-09-01 0.000000 0.000000 72.343845 0.0 72.343845 0.000000 -72.343845
3 PGKYPWDW-10H1 2013-10-01 0.000000 0.000000 69.903877 0.0 69.903877 0.000000 -69.903877
4 PGKYPWDW-10H1 2013-11-01 0.000000 0.000000 55.692608 0.0 55.692608 0.000000 -55.692608
... ... ... ... ... ... ... ... ... ...
5593 PGKYP-99H4 2018-02-01 234.977304 35.371540 0.000000 0.0 0.000000 270.348844 270.348844
5594 PGKYP-99H4 2018-03-01 223.623477 19.146902 0.000000 0.0 0.000000 242.770378 242.770378
5595 PGKYP-99H4 2018-04-01 183.160144 57.907862 0.000000 0.0 0.000000 241.068006 241.068006
5596 PGKYP-99H4 2018-05-01 149.141098 66.104463 0.000000 0.0 0.000000 215.245561 215.245561
5597 PGKYP-99H4 2018-06-01 216.081794 56.676412 0.000000 0.0 0.000000 272.758206 272.758206

5598 rows × 9 columns

Now we can also have a look at the field-wide injected and produced volumes as a function of time. In the table above we have, for each well and each time step, the injected and produced volumes. Using groupby(by='column_name') we can sum over all wells per time step.

In [23]:
well_volumes.groupby(by='START_DATE').sum()
Out[23]:
OIL WATER WATER_INJECTION STEAM_INJECTION INJECTED PRODUCED TOTAL
START_DATE
2012-01-01 0.000000 959.146124 184.278407 971.118277 1155.396683 959.146124 -196.250559
2012-02-01 0.000000 665.887506 180.456563 483.752163 664.208726 665.887506 1.678780
2012-03-01 39.660938 1137.822782 205.846713 5383.659239 5589.505952 1177.483720 -4412.022232
2012-04-01 63.196739 719.590003 184.264260 2152.679063 2336.943324 782.786742 -1554.156582
2012-05-01 296.441737 811.474937 200.551303 2495.198590 2695.749893 1107.916673 -1587.833219
... ... ... ... ... ... ... ...
2018-02-01 1854.109062 1603.582853 3548.103489 3381.685486 6929.788975 3457.691915 -3472.097061
2018-03-01 2332.843244 1860.421113 4029.201785 4034.626339 8063.828124 4193.264357 -3870.563767
2018-04-01 2144.132674 1668.301704 3730.819868 4527.666458 8258.486326 3812.434378 -4446.051948
2018-05-01 1841.282702 1835.139582 4187.238508 4824.265862 9011.504371 3676.422284 -5335.082087
2018-06-01 2001.685749 1869.300855 4172.318724 4972.219665 9144.538389 3870.986604 -5273.551785

78 rows × 7 columns

Plot the totals as a function of time, where < 0 means that at that time, there was more injected than produced.

In [24]:
fieldwide_volumes = well_volumes.groupby(by='START_DATE').sum()
fieldwide_volumes['date'] = fieldwide_volumes.index
px.line(fieldwide_volumes,x='date',y='TOTAL',title='Field-wide injection/production balance')

We now have field-wide volumes per month and seismicity per month. To make a 1-1 comparison between these, we put them into 1 dataframe.

Since the well volumes are per month, we aggregate the seismic events per month as well by using .resample('M'), where M is for monthly resampling (D for per-day, W for per-week).

In [25]:
microseismic['Events'] = 1 # add a field for storing event counts
ms_fieldwide = microseismic[['Events']].resample('M').count() # resample microseismic to events per month
ms_fieldwide
Out[25]:
Events
Date
2011-04-30 30
2011-05-31 21
2011-06-30 27
2011-07-31 16
2011-08-31 15
... ...
2018-09-30 207
2018-10-31 216
2018-11-30 334
2018-12-31 145
2019-01-31 81

94 rows × 1 columns

Now seismic data and well volumes are per month, so they can be merged together. To do so, however, we need to slightly reformat the dates, since microseismic dates are the last days of the month whereas well volumes are listed by the first day of each month.

Since both objects are timeseries, we can use .to_period('M') to create an index for both in the same format of Year-Month.

In [26]:
ms_fieldwide.index = ms_fieldwide.to_period('M').index
fieldwide_volumes.index = fieldwide_volumes.to_period('M').index
ms_fieldwide
Out[26]:
Events
Date
2011-04 30
2011-05 21
2011-06 27
2011-07 16
2011-08 15
... ...
2018-09 207
2018-10 216
2018-11 334
2018-12 145
2019-01 81

94 rows × 1 columns

Now seismicity per month can be added to fieldwide_volumes using pd.merge(), which can be used to merge any 2 dataframes. We merge by index values, using the keywords left_index and right_index.

In [27]:
merged = pd.merge(fieldwide_volumes, ms_fieldwide, left_index=True, right_index=True)
merged
Out[27]:
OIL WATER WATER_INJECTION STEAM_INJECTION INJECTED PRODUCED TOTAL date Events
START_DATE
2012-01 0.000000 959.146124 184.278407 971.118277 1155.396683 959.146124 -196.250559 2012-01-01 108
2012-02 0.000000 665.887506 180.456563 483.752163 664.208726 665.887506 1.678780 2012-02-01 187
2012-03 39.660938 1137.822782 205.846713 5383.659239 5589.505952 1177.483720 -4412.022232 2012-03-01 254
2012-04 63.196739 719.590003 184.264260 2152.679063 2336.943324 782.786742 -1554.156582 2012-04-01 153
2012-05 296.441737 811.474937 200.551303 2495.198590 2695.749893 1107.916673 -1587.833219 2012-05-01 211
... ... ... ... ... ... ... ... ... ...
2018-02 1854.109062 1603.582853 3548.103489 3381.685486 6929.788975 3457.691915 -3472.097061 2018-02-01 86
2018-03 2332.843244 1860.421113 4029.201785 4034.626339 8063.828124 4193.264357 -3870.563767 2018-03-01 61
2018-04 2144.132674 1668.301704 3730.819868 4527.666458 8258.486326 3812.434378 -4446.051948 2018-04-01 39
2018-05 1841.282702 1835.139582 4187.238508 4824.265862 9011.504371 3676.422284 -5335.082087 2018-05-01 53
2018-06 2001.685749 1869.300855 4172.318724 4972.219665 9144.538389 3870.986604 -5273.551785 2018-06-01 79

78 rows × 9 columns

We can also look at the injected and produced volumes of the different fluids. We already have different columns for each fluid, but to plot these different fluid volumes in 1 figure, the table needs some reshaping using .melt()

In [28]:
melted = merged.melt(id_vars='date')
melted
Out[28]:
date variable value
0 2012-01-01 OIL 0.000000
1 2012-02-01 OIL 0.000000
2 2012-03-01 OIL 39.660938
3 2012-04-01 OIL 63.196739
4 2012-05-01 OIL 296.441737
... ... ... ...
619 2018-02-01 Events 86.000000
620 2018-03-01 Events 61.000000
621 2018-04-01 Events 39.000000
622 2018-05-01 Events 53.000000
623 2018-06-01 Events 79.000000

624 rows × 3 columns

In [29]:
px.line(melted,x='date',y='value',color='variable', title='Field-wide rates')

OK, here we reach a limit of what we can do with plotly express: to plot seismicity on a secondary y-axis and better see the relation between seismicity and injection/production volumes, we need to call in plotly itself.

Using plotly requires a bit more code and might take a bit more time to get used to, but it is extremely powerful and versatile. See the plotly website for more details and examples.

In [30]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=merged['date'],y=merged['OIL'],name='Oil',yaxis='y1'))
fig.add_trace(go.Scatter(x=merged['date'],y=merged['WATER'],name='Water',yaxis='y1'))
fig.add_trace(go.Scatter(x=merged['date'],y=merged['WATER_INJECTION'],name='Water injection',yaxis='y1'))
fig.add_trace(go.Scatter(x=merged['date'],y=merged['STEAM_INJECTION'],name='Steam injection',yaxis='y1'))
fig.add_trace(go.Scatter(x=merged['date'],y=merged['INJECTED'],name='Total injection',yaxis='y1'))
fig.add_trace(go.Scatter(x=merged['date'],y=merged['PRODUCED'],name='Total production',yaxis='y1'))
fig.add_trace(go.Scatter(x=merged['date'],y=merged['Events'],name='Events',yaxis='y2',line=dict(dash='dash',color='black')))
fig.update_layout(
        yaxis1=dict(
            title="Injected/produced volumes [m3]",
        ),
        yaxis2=dict(
            title="Events",
            anchor='free',
            overlaying='y',
            side='right',
            position=1
        ),
    legend_orientation='h',
    title='Field-wide seismicity vs. injection/production'
)
fig.show()

The above figure helps to qualitatively assess whether there is a relation between well volumes and seismicity. Calculating the correlation coefficient between these parameters provides a more quantitative measure. Below are some examples.

In [31]:
px.scatter_matrix(merged)
In [32]:
correlation_matrix = merged.corr()
correlation_matrix
Out[32]:
OIL WATER WATER_INJECTION STEAM_INJECTION INJECTED PRODUCED TOTAL Events
OIL 1.000000 0.611674 0.476816 0.345068 0.510539 0.946198 -0.198869 -0.163674
WATER 0.611674 1.000000 0.843450 0.549111 0.874091 0.834758 -0.720034 -0.201171
WATER_INJECTION 0.476816 0.843450 1.000000 0.356428 0.896866 0.676863 -0.825630 -0.189625
STEAM_INJECTION 0.345068 0.549111 0.356428 1.000000 0.732921 0.464772 -0.717418 -0.086175
INJECTED 0.510539 0.874091 0.896866 0.732921 1.000000 0.712868 -0.940791 -0.178867
PRODUCED 0.946198 0.834758 0.676863 0.464772 0.712868 1.000000 -0.432929 -0.196203
TOTAL -0.198869 -0.720034 -0.825630 -0.717418 -0.940791 -0.432929 1.000000 0.135072
Events -0.163674 -0.201171 -0.189625 -0.086175 -0.178867 -0.196203 0.135072 1.000000
In [33]:
mat = correlation_matrix.values
mat[mat==1] = np.nan
mat[np.tril_indices(mat.shape[0], -1)] = np.nan
fig = go.Figure(
        data=go.Heatmap(
            z=mat,
            x=correlation_matrix.columns,
            y=correlation_matrix.columns,
            zmin=-1,
            zmid=0,
            zmax=1,
            # hoverongaps = False,
            colorscale='RdBu',
            reversescale=True,
            colorbar=dict(
                nticks=3,
                ticktext=["negative","no corr.","positive"],
                tickmode='array',
                tickvals=[-1,0,1],
            ),
        )
    )
fig.update_layout(
    title='Correlation heatmap',
    height=600,
    width=600
)
fig.show()

8. Per-well comparison of injected/produced volumes vs. seismicity

From the previous figure showing field-wide volumes and seismicity rates, there is no convincing relation. However, seismicity might be triggered by local injection or production - there was some clustering in the spatial distribution of seismic events - and these field-wide aggregates hide any local variations.

In this part, we calculate the total injected/produced volumes per well through the following steps:

  1. Define a common well identifier column for the well_locations dataframe and the well_volumes dataframe.
  2. Use groupby() to group the volumes per well.
  3. Merge the locations and grouped well volumes data.

Step 1a: Create a well identifier and assign it as index for the well location dataframe.

In [34]:
well_locations['w_id'] = well_locations['Name'].str.replace('PGKYP','')
well_locations.index = well_locations['w_id']
well_locations
Out[34]:
Type Name x y z w_id
w_id
117H1 Steam injector PGKYP117H1 146.3084 -202.991 -2321.827151 117H1
118H1 Steam injector PGKYP118H1 -227.6311 61.847 -2300.730740 118H1
119H2 Steam injector PGKYP119H2 290.7779 -0.810 -2314.625180 119H2
120H1 Steam injector PGKYP120H1 396.5311 519.811 -2312.246780 120H1
123H1 Steam injector PGKYP123H1 -689.5582 -15.152 -2306.458000 123H1
... ... ... ... ... ... ...
99H4 Producer PGKYP99H4 -632.3642 -1446.487 -2393.273108 99H4
121H1 Water injector PGKYP121H1 -632.8220 -95.544 -2098.473485 121H1
124H2 Water injector PGKYP124H2 -1301.7707 257.947 -2693.706605 124H2
40H2 Water injector PGKYP40H2 409.4545 -1031.604 -2590.693974 40H2
42H2 Water injector PGKYP42H2 -1750.9297 -732.331 -2562.187867 42H2

70 rows × 6 columns

Step 1b: Do the same for the well_volumes dataframe:

In [35]:
well_volumes[['prefix','w_id']] = well_volumes['HOLE_NAME'].str.split('-',expand=True)
well_volumes
Out[35]:
HOLE_NAME START_DATE OIL WATER WATER_INJECTION STEAM_INJECTION INJECTED PRODUCED TOTAL prefix w_id
0 PGKYPWDW-10H1 2012-11-01 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 PGKYPWDW 10H1
1 PGKYPWDW-10H1 2013-08-01 0.000000 0.000000 74.414478 0.0 74.414478 0.000000 -74.414478 PGKYPWDW 10H1
2 PGKYPWDW-10H1 2013-09-01 0.000000 0.000000 72.343845 0.0 72.343845 0.000000 -72.343845 PGKYPWDW 10H1
3 PGKYPWDW-10H1 2013-10-01 0.000000 0.000000 69.903877 0.0 69.903877 0.000000 -69.903877 PGKYPWDW 10H1
4 PGKYPWDW-10H1 2013-11-01 0.000000 0.000000 55.692608 0.0 55.692608 0.000000 -55.692608 PGKYPWDW 10H1
... ... ... ... ... ... ... ... ... ... ... ...
5593 PGKYP-99H4 2018-02-01 234.977304 35.371540 0.000000 0.0 0.000000 270.348844 270.348844 PGKYP 99H4
5594 PGKYP-99H4 2018-03-01 223.623477 19.146902 0.000000 0.0 0.000000 242.770378 242.770378 PGKYP 99H4
5595 PGKYP-99H4 2018-04-01 183.160144 57.907862 0.000000 0.0 0.000000 241.068006 241.068006 PGKYP 99H4
5596 PGKYP-99H4 2018-05-01 149.141098 66.104463 0.000000 0.0 0.000000 215.245561 215.245561 PGKYP 99H4
5597 PGKYP-99H4 2018-06-01 216.081794 56.676412 0.000000 0.0 0.000000 272.758206 272.758206 PGKYP 99H4

5598 rows × 11 columns

Step 2: Group the data to get cumulative volumes per well

In [36]:
summed_volumes = well_volumes.groupby(by='w_id').sum()
summed_volumes
Out[36]:
OIL WATER WATER_INJECTION STEAM_INJECTION INJECTED PRODUCED TOTAL
w_id
100H3 5852.343383 1192.962732 0.000000 0.000000 0.000000 7045.306115 7045.306115
101H1 5140.634483 1606.105089 0.000000 0.000000 0.000000 6746.739572 6746.739572
102H1 3412.342287 1999.246464 0.000000 0.000000 0.000000 5411.588751 5411.588751
103H1 5605.095523 1303.932602 0.000000 0.000000 0.000000 6909.028125 6909.028125
104H1 1567.512775 1058.450072 0.000000 0.000000 0.000000 2625.962848 2625.962848
... ... ... ... ... ... ... ...
96H1 0.000000 0.000000 0.000000 1912.101953 1912.101953 0.000000 -1912.101953
97H1 0.000000 0.000000 0.000000 18349.760004 18349.760004 0.000000 -18349.760004
98H1 3094.501446 1882.229155 0.000000 0.000000 0.000000 4976.730601 4976.730601
99H4 5836.563511 2386.582716 0.000000 0.000000 0.000000 8223.146227 8223.146227
9H1 0.000000 0.000000 3998.952895 0.000000 3998.952895 0.000000 -3998.952895

106 rows × 7 columns
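`groupby('w_id').sum()` adds up every numeric column within each group; note that in recent pandas versions you may need `numeric_only=True` to skip non-numeric columns such as `HOLE_NAME`. A toy sketch (column names chosen to mirror the workshop data):

```python
import pandas as pd

toy = pd.DataFrame({'w_id':  ['A', 'A', 'B'],
                    'OIL':   [1.0, 2.0, 5.0],
                    'WATER': [0.5, 0.5, 0.0]})
# numeric_only=True skips any non-numeric columns when summing
totals = toy.groupby(by='w_id').sum(numeric_only=True)
# totals.loc['A', 'OIL'] is 3.0 (the two 'A' rows summed)
```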

Step 3: Merge well volumes with well locations

In [37]:
wells = pd.merge(well_locations, summed_volumes, left_index=True, right_index=True)
wells
Out[37]:
Type Name x y z w_id OIL WATER WATER_INJECTION STEAM_INJECTION INJECTED PRODUCED TOTAL
w_id
117H1 Steam injector PGKYP117H1 146.3084 -202.991 -2321.827151 117H1 0.000000 0.000000 0.0 9525.094246 9525.094246 0.000000 -9525.094246
118H1 Steam injector PGKYP118H1 -227.6311 61.847 -2300.730740 118H1 0.000000 0.000000 0.0 2826.525173 2826.525173 0.000000 -2826.525173
119H2 Steam injector PGKYP119H2 290.7779 -0.810 -2314.625180 119H2 0.000000 0.000000 0.0 9118.505463 9118.505463 0.000000 -9118.505463
120H1 Steam injector PGKYP120H1 396.5311 519.811 -2312.246780 120H1 0.000000 0.000000 0.0 4176.984592 4176.984592 0.000000 -4176.984592
123H1 Steam injector PGKYP123H1 -689.5582 -15.152 -2306.458000 123H1 0.000000 0.000000 0.0 13897.880596 13897.880596 0.000000 -13897.880596
... ... ... ... ... ... ... ... ... ... ... ... ... ...
99H4 Producer PGKYP99H4 -632.3642 -1446.487 -2393.273108 99H4 5836.563511 2386.582716 0.0 0.000000 0.000000 8223.146227 8223.146227
121H1 Water injector PGKYP121H1 -632.8220 -95.544 -2098.473485 121H1 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000
124H2 Water injector PGKYP124H2 -1301.7707 257.947 -2693.706605 124H2 0.000000 771.286964 0.0 0.000000 0.000000 771.286964 771.286964
40H2 Water injector PGKYP40H2 409.4545 -1031.604 -2590.693974 40H2 0.000000 386.234927 0.0 0.000000 0.000000 386.234927 386.234927
42H2 Water injector PGKYP42H2 -1750.9297 -732.331 -2562.187867 42H2 0.000000 7271.926400 0.0 0.000000 0.000000 7271.926400 7271.926400

70 rows × 13 columns

In [38]:
fig=go.Figure()
fig.add_trace(px.scatter_3d(microseismic,x='Easting [m]', y='Northing [m]', z='Depth_SS [m]', size='Moment Magnitude', size_max=10).data[0])
fig.add_trace(px.scatter_3d(wells, x='x', y='y', z='z', color='INJECTED', size='INJECTED', hover_name='Name').data[0])
fig.add_trace(px.scatter_3d(wells, x='x', y='y', z='z', color='PRODUCED', size='PRODUCED', hover_name='Name').data[0])
fig.show()

9. Correlation between well volumes and seismicity

The final step is to further quantify the observations from the 3-D plot above by calculating the correlation coefficient between the volume injected/produced in each well and the number of seismic events.

For this, we go back to well_volumes and use pivot_table() to get a table with a row for each time step and a column for each well.

In [39]:
well_volumes
Out[39]:
HOLE_NAME START_DATE OIL WATER WATER_INJECTION STEAM_INJECTION INJECTED PRODUCED TOTAL prefix w_id
0 PGKYPWDW-10H1 2012-11-01 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 PGKYPWDW 10H1
1 PGKYPWDW-10H1 2013-08-01 0.000000 0.000000 74.414478 0.0 74.414478 0.000000 -74.414478 PGKYPWDW 10H1
2 PGKYPWDW-10H1 2013-09-01 0.000000 0.000000 72.343845 0.0 72.343845 0.000000 -72.343845 PGKYPWDW 10H1
3 PGKYPWDW-10H1 2013-10-01 0.000000 0.000000 69.903877 0.0 69.903877 0.000000 -69.903877 PGKYPWDW 10H1
4 PGKYPWDW-10H1 2013-11-01 0.000000 0.000000 55.692608 0.0 55.692608 0.000000 -55.692608 PGKYPWDW 10H1
... ... ... ... ... ... ... ... ... ... ... ...
5593 PGKYP-99H4 2018-02-01 234.977304 35.371540 0.000000 0.0 0.000000 270.348844 270.348844 PGKYP 99H4
5594 PGKYP-99H4 2018-03-01 223.623477 19.146902 0.000000 0.0 0.000000 242.770378 242.770378 PGKYP 99H4
5595 PGKYP-99H4 2018-04-01 183.160144 57.907862 0.000000 0.0 0.000000 241.068006 241.068006 PGKYP 99H4
5596 PGKYP-99H4 2018-05-01 149.141098 66.104463 0.000000 0.0 0.000000 215.245561 215.245561 PGKYP 99H4
5597 PGKYP-99H4 2018-06-01 216.081794 56.676412 0.000000 0.0 0.000000 272.758206 272.758206 PGKYP 99H4

5598 rows × 11 columns

In [40]:
volume_per_well = well_volumes.pivot_table(index='START_DATE',columns='w_id',values='TOTAL')
volume_per_well.index = volume_per_well.index.to_period('M') 
volume_per_well.fillna(0,inplace=True)

volume_per_well
Out[40]:
w_id 100H3 101H1 102H1 103H1 104H1 105H1 106H1 107H1 108H1 109H1 ... 91H1 92H2 93H1 94H1 95H1 96H1 97H1 98H1 99H4 9H1
START_DATE
2012-01 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 98.508996 92.797510 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000
2012-02 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 73.654870 44.980243 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000
2012-03 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 90.344337 30.531579 -486.263803 0.000000 0.0 0.000000 0.000000 0.000000 0.000000
2012-04 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 32.317170 62.456635 -224.958078 0.000000 0.0 0.000000 0.000000 0.000000 0.000000
2012-05 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 61.045904 146.795815 -1.597162 0.000000 0.0 0.000000 0.000000 0.000000 0.000000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2018-02 87.503561 88.557881 8.087603 69.418812 69.202638 116.451333 415.682703 113.269137 62.651989 10.618387 ... 0.0 225.106641 0.000000 0.000000 56.441492 0.0 -551.883263 57.851462 270.348844 -43.411711
2018-03 93.356993 111.969004 72.046469 54.013385 81.091717 111.387840 489.566759 98.880905 42.399033 5.886228 ... 0.0 1281.201395 0.000000 0.000000 42.583627 0.0 -728.867278 67.266068 242.770378 -45.136445
2018-04 70.321192 146.648449 105.261215 131.280318 60.022853 103.897500 478.716780 95.301759 128.004958 2.792661 ... 0.0 79.818175 0.000000 0.000000 66.991164 0.0 -625.443884 100.423233 241.068006 -46.505471
2018-05 94.488789 85.225570 84.287279 65.438812 15.787108 90.053213 477.480582 109.932057 76.452366 66.631142 ... 0.0 6.627173 0.000000 0.000000 54.507179 0.0 -661.699576 109.087657 215.245561 -57.757394
2018-06 88.645946 80.611006 68.220110 77.297390 91.126259 107.649293 389.091333 89.366622 133.865833 219.125525 ... 0.0 0.000000 0.000000 0.000000 70.714813 0.0 -636.114055 102.098521 272.758206 -58.990993

78 rows × 106 columns
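`pivot_table()` reshapes the long table into a wide one: one row per index value, one column per `w_id`, with each cell holding the aggregated value (the mean, by default). Months where a well has no entry come out as NaN, which is why we follow up with `fillna(0)`. A minimal sketch:

```python
import pandas as pd

toy = pd.DataFrame({'START_DATE': ['2018-01', '2018-01', '2018-02'],
                    'w_id':       ['A', 'B', 'A'],
                    'TOTAL':      [1.0, 2.0, 3.0]})
wide = toy.pivot_table(index='START_DATE', columns='w_id', values='TOTAL')
wide = wide.fillna(0)  # well 'B' has no 2018-02 entry -> NaN -> 0
```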

This table can be readily merged with the table of microseismic event counts per month that we created earlier (ms_fieldwide).

In [41]:
merged_final = pd.merge(ms_fieldwide, volume_per_well, left_index=True, right_index=True)
merged_final
Out[41]:
Events 100H3 101H1 102H1 103H1 104H1 105H1 106H1 107H1 108H1 ... 91H1 92H2 93H1 94H1 95H1 96H1 97H1 98H1 99H4 9H1
Date
2012-01 108 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 98.508996 92.797510 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000
2012-02 187 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 73.654870 44.980243 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000
2012-03 254 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 90.344337 30.531579 -486.263803 0.000000 0.0 0.000000 0.000000 0.000000 0.000000
2012-04 153 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 32.317170 62.456635 -224.958078 0.000000 0.0 0.000000 0.000000 0.000000 0.000000
2012-05 211 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0 61.045904 146.795815 -1.597162 0.000000 0.0 0.000000 0.000000 0.000000 0.000000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2018-02 86 87.503561 88.557881 8.087603 69.418812 69.202638 116.451333 415.682703 113.269137 62.651989 ... 0.0 225.106641 0.000000 0.000000 56.441492 0.0 -551.883263 57.851462 270.348844 -43.411711
2018-03 61 93.356993 111.969004 72.046469 54.013385 81.091717 111.387840 489.566759 98.880905 42.399033 ... 0.0 1281.201395 0.000000 0.000000 42.583627 0.0 -728.867278 67.266068 242.770378 -45.136445
2018-04 39 70.321192 146.648449 105.261215 131.280318 60.022853 103.897500 478.716780 95.301759 128.004958 ... 0.0 79.818175 0.000000 0.000000 66.991164 0.0 -625.443884 100.423233 241.068006 -46.505471
2018-05 53 94.488789 85.225570 84.287279 65.438812 15.787108 90.053213 477.480582 109.932057 76.452366 ... 0.0 6.627173 0.000000 0.000000 54.507179 0.0 -661.699576 109.087657 215.245561 -57.757394
2018-06 79 88.645946 80.611006 68.220110 77.297390 91.126259 107.649293 389.091333 89.366622 133.865833 ... 0.0 0.000000 0.000000 0.000000 70.714813 0.0 -636.114055 102.098521 272.758206 -58.990993

78 rows × 107 columns

As before, we can use corr() to calculate the correlation coefficients between the monthly volumes injected/produced by each well and the seismicity.

In [42]:
corr_per_well = merged_final.corr()
corr_per_well['labels'] = corr_per_well.columns # some cleaning up/formatting for plotting 
corr_per_well[corr_per_well == 1] = np.nan
corr_per_well
Out[42]:
Events 100H3 101H1 102H1 103H1 104H1 105H1 106H1 107H1 108H1 ... 92H2 93H1 94H1 95H1 96H1 97H1 98H1 99H4 9H1 labels
Events NaN -0.059970 -0.300151 -0.234067 -0.247627 -0.246544 0.067555 -0.032779 0.083697 -0.045027 ... -0.013121 -0.014842 0.248329 0.040396 0.182262 -0.064974 0.127183 0.079752 0.246636 Events
100H3 -0.059970 NaN 0.604029 0.638490 0.616372 0.515871 0.421964 0.181063 0.316510 0.449520 ... 0.049426 -0.208652 -0.497930 0.571159 -0.251694 -0.454075 0.572830 0.365528 -0.611396 100H3
101H1 -0.300151 0.604029 NaN 0.519669 0.434989 0.558508 0.412734 0.329245 0.263777 0.499152 ... 0.062154 -0.239236 -0.410969 0.318887 -0.215662 -0.381156 0.370537 0.312714 -0.662861 101H1
102H1 -0.234067 0.638490 0.519669 NaN 0.473220 0.254677 0.115446 -0.050443 0.278029 0.392322 ... -0.149097 -0.194941 -0.564981 0.524535 -0.421662 -0.134641 0.346431 0.160578 -0.663682 102H1
103H1 -0.247627 0.616372 0.434989 0.473220 NaN 0.342917 0.152451 -0.001739 -0.039831 0.101149 ... -0.121974 0.214616 -0.567073 0.091574 -0.220713 -0.047643 0.239278 0.028294 -0.638274 103H1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
96H1 0.182262 -0.251694 -0.215662 -0.421662 -0.220713 -0.223535 0.074001 0.123343 0.068993 -0.076600 ... 0.121630 -0.041025 0.478850 -0.216655 NaN 0.184596 -0.038180 0.130094 0.398423 96H1
97H1 -0.064974 -0.454075 -0.381156 -0.134641 -0.047643 -0.524235 -0.737846 -0.650346 -0.473561 -0.664569 ... -0.362397 0.582451 -0.045702 -0.516557 0.184596 NaN -0.663095 -0.649097 0.303498 97H1
98H1 0.127183 0.572830 0.370537 0.346431 0.239278 0.274037 0.755840 0.671557 0.773099 0.752226 ... 0.191082 -0.599127 0.081836 0.733832 -0.038180 -0.663095 NaN 0.803279 -0.515331 98H1
99H4 0.079752 0.365528 0.312714 0.160578 0.028294 0.187065 0.729373 0.759224 0.759519 0.802445 ... 0.394286 -0.613981 0.304525 0.555858 0.130094 -0.649097 0.803279 NaN -0.358669 99H4
9H1 0.246636 -0.611396 -0.662861 -0.663682 -0.638274 -0.391838 -0.383594 -0.282928 -0.379746 -0.481897 ... 0.007268 0.192123 0.436841 -0.424407 0.398423 0.303498 -0.515331 -0.358669 NaN 9H1

107 rows × 108 columns
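`corr()` computes the pairwise Pearson correlation between all numeric columns, so the Events column of the result holds exactly what we are after: how each well's monthly volume tracks the monthly event count. A toy example:

```python
import pandas as pd

toy = pd.DataFrame({'Events': [1, 2, 3, 4],
                    'wellA':  [2.0, 4.0, 6.0, 8.0],   # moves with Events
                    'wellB':  [4.0, 3.0, 2.0, 1.0]})  # moves against Events
c = toy.corr()
# c.loc['wellA', 'Events'] is 1.0, c.loc['wellB', 'Events'] is -1.0
```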

In [43]:
px.bar(corr_per_well, x='labels', y='Events', labels={'Events': 'Correlation','labels': 'Well'}, title='Seismicity vs well volume').update_xaxes(categoryorder='total descending')

We can see that wells 33 and 24 show a relatively strong correlation between monthly well volume and seismicity. To better understand what type of wells these are and where they are located, we add the correlations to the wells table.

In [44]:
wells_final = pd.merge(wells, corr_per_well[['Events']], left_index=True, right_index=True)
wells_final = wells_final.rename(columns = {'Events':'Correlation'}) # rename a column name
wells_final
Out[44]:
Type Name x y z w_id OIL WATER WATER_INJECTION STEAM_INJECTION INJECTED PRODUCED TOTAL Correlation
117H1 Steam injector PGKYP117H1 146.3084 -202.991 -2321.827151 117H1 0.000000 0.000000 0.0 9525.094246 9525.094246 0.000000 -9525.094246 0.057337
118H1 Steam injector PGKYP118H1 -227.6311 61.847 -2300.730740 118H1 0.000000 0.000000 0.0 2826.525173 2826.525173 0.000000 -2826.525173 0.128764
119H2 Steam injector PGKYP119H2 290.7779 -0.810 -2314.625180 119H2 0.000000 0.000000 0.0 9118.505463 9118.505463 0.000000 -9118.505463 0.045992
120H1 Steam injector PGKYP120H1 396.5311 519.811 -2312.246780 120H1 0.000000 0.000000 0.0 4176.984592 4176.984592 0.000000 -4176.984592 -0.030245
123H1 Steam injector PGKYP123H1 -689.5582 -15.152 -2306.458000 123H1 0.000000 0.000000 0.0 13897.880596 13897.880596 0.000000 -13897.880596 0.014979
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
99H4 Producer PGKYP99H4 -632.3642 -1446.487 -2393.273108 99H4 5836.563511 2386.582716 0.0 0.000000 0.000000 8223.146227 8223.146227 0.079752
121H1 Water injector PGKYP121H1 -632.8220 -95.544 -2098.473485 121H1 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 NaN
124H2 Water injector PGKYP124H2 -1301.7707 257.947 -2693.706605 124H2 0.000000 771.286964 0.0 0.000000 0.000000 771.286964 771.286964 -0.219553
40H2 Water injector PGKYP40H2 409.4545 -1031.604 -2590.693974 40H2 0.000000 386.234927 0.0 0.000000 0.000000 386.234927 386.234927 -0.077619
42H2 Water injector PGKYP42H2 -1750.9297 -732.331 -2562.187867 42H2 0.000000 7271.926400 0.0 0.000000 0.000000 7271.926400 7271.926400 0.016570

70 rows × 14 columns

We can again make the bar plot, and now color the bars by well type.

In [45]:
px.bar(wells_final, x='w_id', y='Correlation', color='Type')

Or use facet_row to make subplots based on well type, and now color the bars based on total volume injected/produced.

In [46]:
px.bar(wells_final, x='w_id', y='Correlation', facet_row='Type', color='TOTAL')

Finally, we can again plot the seismicity in 3-D and add the well locations, with:

  • symbol size scaled by the total volume injected/produced
  • symbol color indicating the correlation between injected/produced volume and seismicity.
In [47]:
fig=go.Figure()
fig.add_trace(px.scatter_3d(microseismic,x='Easting [m]', y='Northing [m]', z='Depth_SS [m]', size='Moment Magnitude', size_max=5, opacity=.5).data[0])
fig.add_trace(px.scatter_3d(wells_final, x='x', y='y', z='z', color='Correlation', size='INJECTED', hover_name='Name', size_max=50).data[0])
fig.add_trace(px.scatter_3d(wells_final, x='x', y='y', z='z', color='Correlation', size='PRODUCED', hover_name='Name', size_max=50).data[0])
fig.show()

All of the tables that we generated can be exported to CSV files (which Excel opens directly) using object_name.to_csv("filename.csv")

In [48]:
wells_final.to_csv("wells_final.csv")
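to_csv() writes a plain-text CSV file; if you want a native .xlsx file instead, pandas also offers to_excel(), which requires an extra package (such as openpyxl) to be installed. A quick sketch with a toy table:

```python
import pandas as pd

toy = pd.DataFrame({'Correlation': [0.25, -0.08]}, index=['A', 'B'])
toy.to_csv('toy.csv')        # plain-text CSV, opens in Excel
# toy.to_excel('toy.xlsx')   # native Excel file; needs openpyxl installed
```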

🏁 Conclusions

Based on the available data, there is no clear relation between seismicity and injection/production. As per usual, we need more data.
